(54) Flexible Viterbi decoder for wireless applications

(57) A Viterbi decoder system is provided in accordance with the present invention. The decoder system includes a State Metric Update unit including a state metric memory and a cascaded Add/Compare/Select (ACS) unit. The cascaded ACS unit comprises a plurality of serially coupled ACS stages for performing a plurality of ACS operations in conjunction with the state metric memory. An ACS stage is operable to identify a plurality of path decisions and communicate the identified path decisions to a next ACS stage coupled thereto. A Traceback unit is provided for storing a set of accumulated path decisions in a traceback memory associated therewith, and performing a traceback on the set of accumulated path decisions. The path decisions associated with the ACS stage and the next ACS stage are accumulated as a set during the ACS operations before being written to the traceback memory, thereby minimizing accesses to the traceback memory.
Description

[0001] The present invention relates generally to Viterbi coding systems, and in particular, but not exclusively, to a system and method for providing flexible, high-speed, and low-power decoding (based on the Viterbi algorithm) of convolutional codes for wireless and other types of communication applications.

[0002] Modern society has witnessed a dramatic increase in wireless communications. Wireless technology (e.g., satellite, microwave) has provided a system whereby cellular and other communications have become an ever-increasing necessity. In order to satisfy the demand for increased and reliable communications capability, more flexible, powerful, and efficient systems are needed. In particular, forward error correction systems must be improved to satisfy society's need for increased wireless communications.

[0003] Forward error correction systems are a necessary component in many of today's communications systems. These systems generally add robustness to communications systems by substantially correcting errors that may occur during transmission and reception of wireless data. This is particularly true for systems which are limited in power and/or bandwidth. Often, convolutional coding is a key part of such forward error correction systems. In general, convolutional coding systems introduce redundant data into a wireless data transmission so that random errors occurring in the transmission have a high probability of being corrected. Consequently, decoding systems (e.g., a Viterbi decoder) must be in place to decode the convolutionally coded data upon reception of the transmitted data, and thereby reconstruct the actual data transmission.

[0004] Referring to prior art Fig. 1, a wireless communications system 10 illustrates a particular challenge presented to a conventional wireless system. A transmitter 20 directs a communications signal 24 to a satellite system 30. The satellite system 30, upon receiving the communications signal 24, then directs a communications signal 24a to a ground base station 32 wherein the signal is processed for the intended destination. At any time during transmission of the communications signals 24 and 24a, noise 34 may corrupt a portion of the transmission (cause an error), thereby causing improper signal reception at the base station 32. If error correction systems were not provided, the signal would likely have to be re-transmitted in order to be properly received at the base station 32. Thus, inefficiencies and increased costs are likely results.

[0005] Fig. 2 illustrates a prior art error correction system 40 employing convolutional encoding and Viterbi decoding for increasing the likelihood that transmission signals may be properly communicated despite the presence of noise. Input data 42 (e.g., audio, video, computer data) is input to a convolutional encoder 44. Encoded data is provided as a sequence of data bits 46 (also referred to as encoded symbols), which are composed of actual and redundantly added data, and transmitted over a communications link 48. The communications link 48 may introduce noise into the data transmission and, therefore, the transmitted data bits 46 may be corrupted by the time they reach their destination. Each received (and possibly corrupted) data bit 46a may be processed by a Viterbi decoder 50 to provide decoded output data 52. The Viterbi decoder 50 (based upon the Viterbi algorithm, which was first proposed by Andrew Viterbi in 1967) provides a decoding system wherein the input data 42 that was originally transmitted may be determined with high probability even though noise may have affected some of the transmitted (convolutionally encoded) data 46. In general, the input data 42 may be determined by computing a most likely sequence for the input data 42, which is derived from the convolutionally encoded data 46a.

[0006] Convolutional encoding is performed by convolving (redundantly adding) input data bits 42 via an encoder with one or more previous input bits 42. An example of a conventional rate 1/2, constraint length 9 convolutional encoder 44 is shown in prior art Fig. 3. Input bits 42 are input to a series of delay elements 60, such as a shift register 44a, that provides outputs X^0 through X^8 at various points. The outputs X^0 through X^8 may be combined by XOR functions 62a and 62b to generate an encoded symbol set G_0 and G_1. The outputs X^0 through X^8 that are connected (tapped) to the XOR functions 62a and 62b determine the output code sequence of G_0 and G_1 for a given input data sequence 42. The input to output relationship may be described by a code polynomial for each of the encoder outputs G_0 and G_1, where + denotes modulo-2 addition (XOR). For example, for the encoder 44 shown in Fig. 3, the code polynomials are given as:

$$G_0 = X^0 + X^1 + X^3 + X^6 + X^8 = 1 + X^1 + X^3 + X^6 + X^8$$

$$G_1 = X^0 + X^2 + X^3 + X^7 + X^8 = 1 + X^2 + X^3 + X^7 + X^8$$
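As an illustration of how the tapped shift register realizes these polynomials, the following minimal Python sketch encodes a bit sequence with the two K = 9 polynomials above. It assumes the register starts in the all-zero state and that X^j denotes the input delayed by j bit periods; the names `encode`, `G0_TAPS`, and `G1_TAPS` are hypothetical, not taken from the text.

```python
# Tap positions read from the code polynomials above (X^j = input delayed by j bits).
G0_TAPS = (0, 1, 3, 6, 8)   # G_0 = 1 + X^1 + X^3 + X^6 + X^8
G1_TAPS = (0, 2, 3, 7, 8)   # G_1 = 1 + X^2 + X^3 + X^7 + X^8
K = 9                       # constraint length: the input bit plus 8 delay elements


def encode(bits, polys=(G0_TAPS, G1_TAPS), k=K):
    """Rate 1/len(polys) convolutional encoder; returns G_0, G_1, G_0, G_1, ..."""
    window = [0] * k                        # window[j] holds X^j; starts all zero
    out = []
    for b in bits:
        window = [b] + window[:-1]          # shift the new input bit into X^0
        for taps in polys:
            out.append(sum(window[j] for j in taps) % 2)   # XOR of the tapped outputs
    return out


# A single 1 followed by zeros reproduces the tap pattern of each polynomial.
print(encode([1] + [0] * 8))
# → [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
```

Each input bit yields the pair (G_0, G_1), which is what makes the encoder rate 1/2; the eight previous inputs held in `window[1:]` form the encoder state.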

[0007] Note: Texas Instruments Applications Report SPRA071, Viterbi Decoding Techniques in the TMS320C54x Family, 1996, provides further details on convolutional encoders and code polynomials and is hereby incorporated by reference in its entirety.

[0008] As shown, the encoder 44 of Fig. 3 generates the encoded symbol set G_0 and G_1 for every input bit 42. Thus, the encoder has a rate of 1/2 (1 input / 2 outputs). The constraint length (K) represents the total span of combinations employed by the encoder, which is a function of the number of delay elements 60. A constraint length K = 9 implies there are $2^{(9-1)} = 256$ encoder states (the ninth bit is the input bit). These states are represented as state S0 (binary 00000000) to state S255 (binary 11111111).

[0009] Convolutionally encoded data may be decoded according to the Viterbi algorithm. The basis of the Viterbi algorithm is to decode convolutionally encoded data by employing knowledge (e.g., mimicking the encoder) of the possible encoder 44 output state transitions from one given state to the next, based on the dependence of a given data state on past input data 42. The allowable state transitions are typically represented by a trellis diagram (similar to a conventional state diagram), which provides possible state paths for a received data sequence based upon the encoding process of the input data 42. The trellis structure is determined by the overall structure and code polynomial configuration of the convolutional encoder 44 described above. The Viterbi algorithm provides a method for minimizing the number of state paths through the trellis by limiting the paths to those with the highest probability of matching the transmitted encoder 44 output sequence with the received data sequence at the decoder.

[0010] Fig. 4 is an illustration of a portion of a trellis 66 and depicts a basic Viterbi algorithm butterfly computation. Four possible encoder transitions 70a through 70d from present state nodes 68a and 68b to next state nodes 68c and 68d are illustrated. As shown, two transition paths (branches) leave each present state node 68a and 68b, one to each of the next state nodes 68c and 68d. The Viterbi algorithm provides a process by which the more likely of two possible transition paths may be determined and subsequently selected as a portion of a "survivor" path. For example, branches 70a and 70b provide two possible transition paths to the next state node 68c. Likewise, branches 70c and 70d provide two possible transition paths to the next state node 68d. The transition paths 70a through 70d provide the possible directions to the next most likely states that may be generated by the convolutional encoder 44 as directed by the input bits 42. Once a sequence of survivor paths has been determined (through a plurality of butterfly stages), the most probable data input sequence 42 to the convolutional encoder 44 can be reconstructed, thus decoding the convolutionally encoded data.

[0011] The decoder operation generally includes the steps of a branch metric computation, an Add/Compare/Select (ACS) operation, and a traceback operation. The branch metric computation provides a measurement of the likelihood that a given transition path from a present state to a next state is correct. In the branch metric computation, the received data values, typically 8- or 16-bit digital values representing the magnitude of voltage or current of an input signal, are processed to determine a Euclidean or equivalent distance (see the TI reference noted above for further details) between the received data values and all possible actual data values, uncorrupted by noise, which may result from a state transition from a present state to a next state.

[0012] Thus, decoding data signals produced by a convolutional encoder of rate 1/R with a constraint length of K requires determining a total of $2^R$ branch metric values for each encoded symbol input to the decoder, because each encoded symbol carries R code bits and therefore has $2^R$ possible noise-free values. As described herein, the set of $2^R$ branch metric values is defined as the complete branch metric set for a particular received input symbol.
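For a rate 1/2 code (R = 2) this gives four branch metrics per received symbol, and for rate 1/3 it gives eight. The sketch below enumerates the complete set under two assumptions that the text does not fix: code bits are mapped to antipodal levels (bit 0 to +1.0, bit 1 to -1.0), and the metric is the squared Euclidean distance, so a smaller value means a more likely branch. The function name `branch_metric_set` is hypothetical.

```python
from itertools import product


def branch_metric_set(received, levels=(1.0, -1.0)):
    """Return {code_word: metric} for all 2**R code words of one received symbol.

    received: R soft values for one encoded symbol (rate 1/R).
    levels:   assumed transmit level for code bit 0 and for code bit 1.
    """
    metrics = {}
    for word in product((0, 1), repeat=len(received)):
        # Squared Euclidean distance between the received values and the noiseless word.
        metrics[word] = sum((x - levels[bit]) ** 2 for x, bit in zip(received, word))
    return metrics


bm = branch_metric_set([0.9, -1.1])     # a noisy version of code word (0, 1)
print(len(bm), min(bm, key=bm.get))     # → 4 (0, 1)
```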
[0013] In the next decoder step, previously computed branch metric values for all possible state transitions are processed to determine an "accumulated distance" for each of the paths to the next state. The path with the minimum or maximum distance, depending on the implementation (i.e., the maximum probability), is then selected as the survivor path. This is known as the Add/Compare/Select, or ACS, operation. The ACS operation can be broken into two basic operations: an Add operation, or path metric computation, and a Compare/Select operation. The path metric Add operation is the accumulation of present state values (initialized by a user at the start of Viterbi processing and carried forward from state to state) with the branch metric values for a received data input sequence. The Compare/Select operation computes and compares two values from the Add operation to determine the minimum value (or maximum value, depending on the implementation) and stores one or more "traceback bits" to indicate the selected survivor path.
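A minimal sketch of one trellis stage of ACS follows, written for the smaller-is-better (distance) convention. It assumes the shift-register state convention next_state = (2 * state + bit) mod N, under which state s is reached from predecessors s >> 1 and (s >> 1) + N/2 with input bit s & 1; the function `acs_stage` and the `branch_metric` callback are hypothetical. The decision bit returned for each state plays the role of the traceback bit described above.

```python
def acs_stage(metrics, branch_metric):
    """One Add/Compare/Select step over all N states (smaller metric wins).

    metrics:        accumulated state metrics, indexed by state (N = len(metrics)).
    branch_metric:  callable (prev_state, bit) -> metric of that transition.
    Returns (new_metrics, decisions); decisions[s] = 0 selects predecessor s >> 1,
    decisions[s] = 1 selects (s >> 1) + N // 2.
    """
    n = len(metrics)
    half = n // 2
    new_metrics, decisions = [0] * n, [0] * n
    for s in range(n):
        bit = s & 1                          # input bit that leads into state s
        p0, p1 = s >> 1, (s >> 1) + half     # the two possible predecessor states
        m0 = metrics[p0] + branch_metric(p0, bit)   # Add
        m1 = metrics[p1] + branch_metric(p1, bit)
        if m1 < m0:                                 # Compare
            new_metrics[s], decisions[s] = m1, 1    # Select (traceback bit = 1)
        else:
            new_metrics[s], decisions[s] = m0, 0    # Select (traceback bit = 0)
    return new_metrics, decisions


# Toy 4-state example: input bit 1 costs 1, input bit 0 costs nothing.
print(acs_stage([9, 5, 0, 5], lambda prev, bit: bit))   # → ([0, 1, 5, 6], [1, 1, 0, 0])
```

A maximum-metric implementation only reverses the comparison; the add and select steps are unchanged.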
[0014] The final decoding step is the traceback operation. This step traces the maximum likelihood path through the 
trellis of state transitions, as determined by the first two steps, and reconstructs the most likely path through the trellis 
to extract the original data input to the encoder 44. 

[0015] Conventionally, digital signal processors (DSPs) have been employed to handle various Viterbi decoding applications. Many DSPs have special instructions specifically designed for the Viterbi decoding algorithm. For example, many of today's cellular phone applications involve DSP solutions. However, when a code such as the code described above (K=9) is employed in conjunction with high data rates (384 kbits/sec to 2 Mbits/sec), high computation rates are generally required. This may require $49 \times 10^6$ to $256 \times 10^6$ Viterbi ACS operations per second. These computing operations are multiplied even more when multiple voice/data channels are processed by a DSP in a cellular base station, for example. Thus, Viterbi decoding may consume a large portion of the DSP's computational bandwidth. Consequently, higher performance systems are necessary to meet increased computational demands.
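The quoted range is consistent with counting one radix-2 butterfly per pair of states per decoded bit: a K=9 code has 2^8 = 256 states, hence 128 butterflies per bit, and 128 × 384,000 ≈ 49 × 10^6 while 128 × 2,000,000 = 256 × 10^6. Reading "ACS operation" as a butterfly is our assumption; counting each per-state ACS result separately would double both figures. The hypothetical helper below simply reproduces that arithmetic.

```python
def butterflies_per_second(constraint_length, bit_rate):
    """Radix-2 butterflies per second for a single-shift-register rate 1/n code.

    A code of constraint length K has 2**(K-1) states, i.e. 2**(K-2) butterflies,
    and one trellis stage is processed per decoded bit.
    """
    return 2 ** (constraint_length - 2) * bit_rate


for rate in (384_000, 2_000_000):
    print(rate, butterflies_per_second(9, rate))
# → 384000 49152000
# → 2000000 256000000
```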
[0016] Another challenge faced by conventional decoding systems is the need to decode various forms of convolutional codes. Many decoding systems are hard-wired and/or hard-coded to deal with a particular type of convolutional code. For example, the constraint length K described above may vary (e.g., K = 9, 8, 7, 6, 5, etc.) from one encoding system to the next. Also, the code polynomials mentioned above may vary from system to system, even though the constraint length may remain unchanged. A hard-wired and/or hard-coded decoding system may need to be re-designed in order to meet these different encoding requirements. Various other parameters also may need to be varied in the encoding/decoding process. Therefore, it would be desirable for a decoding system to provide a high degree of flexibility in processing various forms of encoded data.

[0017] Still another challenge faced by conventional decoding systems is increased power requirements. As data is decoded at higher rates, the computational demands of the decoding system often increase the power requirements of the decoders (e.g., DSPs, processing systems). Many conventional systems require extensive register and memory accesses during the decoding process. This generally increases the power consumed in decoders and generally lowers decoder performance (e.g., speed, reliability).

[0018] In view of the above problems associated with conventional decoding systems, it would therefore be desirable to have a Viterbi decoding system and/or method which provides a high degree of flexibility with increased decoding performance and with lower power requirements.

[0019] An aspect of the present invention is directed toward a VLSI architecture for a Viterbi decoder for wireless or other types of applications, which may operate within a programmable DSP system and which provides flexibility, low power, and high data throughput rates. The architecture is intended to provide a cost-effective solution for multiple application areas, including cellular basestations and mobile handsets.

[0020] The decoder preferably operates on a plurality of common linear (single shift register) convolutional codes of rate 1/n and constraint length K = 9 (256 states) or less, and is capable of a high throughput rate of 2.5 Mbps in the case of K=9. In particular, high data throughput rates are achieved by a cascaded ACS system which operates over several trellis stages simultaneously. Additionally, the cascaded ACS performs a partial pretraceback operation, over multiple trellis stages, during the ACS operation. This increases system throughput by reducing the complexity of a final traceback operation to retrieve decoded output bits and by substantially decreasing the number of memory accesses associated therewith.

[0021] The high data throughput rate enables the decoder to handle hundreds of voice channels for next generation cellular basestations. This may greatly reduce the number of DSP processors a system requires and likely lowers system cost relative to a purely DSP-based system. These types of data rates and codes are employed extensively in wireless applications of many varieties, from satellite communications to cellular phones.

[0022] Since there are variations between particular encoding applications, and within some decoding applications, with regard to the exact structure of the Viterbi decoding problem, flexibility is provided in the decoding architecture. In particular, the cascaded ACS system described above may be configured to operate on variable constraint length codes. For K=9, it operates over multiple stages of the trellis by operating on a sub-trellis architecture in conjunction with a state metric memory. For the cases of K < 9, particular ACS stages are selectively bypassed.

[0023] Embodiments of the present invention incorporate a high degree of flexibility to enable the decoder to be employed in many variable situations. The decoder flexibility includes variable constraint lengths, user-supplied polynomial code coefficients, code rates, and traceback settings such as convergence distance and frame structure.
[0024] A DSP interface is provided which is memory mapped to enable high data rate transfers between the decoder of the present invention and a DSP. This greatly reduces the processing burden of the DSP and provides for a more powerful system overall. Significant buffering is also provided within the decoder. An embodiment of the present invention also supports intelligent data transfer and synchronization mechanisms, including various trigger signals such as: execution done, input buffer low, and send/receive block transfer completed.

[0025] Additionally, an embodiment of the present invention has been designed to operate at high data rates and to be highly energy efficient (i.e., low power). Low power operation is accomplished by minimizing register operations and memory accesses, and by parallelizing and streamlining particular aspects of the decoding process. For example, the ACS operation described above performs pretraceback operations during the ACS operation. Additionally, memory accesses are reduced by operating over multiple stages of the trellis simultaneously.

[0026] To the accomplishment of the foregoing and related ends, an embodiment of the invention comprises the features hereinafter fully described. The following description and the annexed drawings set forth in detail certain illustrative embodiments of the invention. These embodiments are by way of example only and are but a few of the various ways in which embodiments of the invention may be implemented. Other advantages and novel features of the invention will become apparent from the following detailed description of embodiments of the invention when considered in conjunction with the drawings, of which:

Figure 1 is a block diagram of a prior art wireless communications system; 

Figure 2 is a block diagram of a prior art convolutional encoder and Viterbi decoder; 

Figure 3 is a schematic block diagram of a prior art convolutional encoder; 

Figure 4 is a prior art Viterbi algorithm butterfly structure illustrating possible encoder transitions from present state nodes to next state nodes;

Figure 5 is a 4-stage, 16-state trellis diagram for Viterbi decoding in accordance with an embodiment of the present invention;

Figure 6 is a schematic block diagram of a Viterbi decoder in accordance with an embodiment of the present invention;

Figure 7 is a schematic block diagram of a cascaded ACS unit for a Viterbi decoder in accordance with an embodiment of the present invention;

Figure 8 is a more detailed schematic block diagram of an ACS unit for a Viterbi decoder in accordance with an embodiment of the present invention;

Figure 9 is a schematic block diagram of a branch metric selection unit for a Viterbi decoder in accordance with an embodiment of the present invention;

Figure 10 is a schematic block diagram of a Traceback Unit in accordance with an embodiment of the present 
invention; and 

Figure 11 is a flow chart diagram illustrating a methodology for Viterbi decoding in accordance with an embodiment of the present invention.

[0027] Embodiments of the present invention will now be described with reference to the drawings, wherein like ref- 
erence numerals are used to refer to like elements throughout. 

[0028] In accordance with an embodiment of the present invention, a Viterbi decoder 110 (Fig. 6) decodes a plurality of trellis stages (Fig. 5) simultaneously via a cascaded ACS 122 (Fig. 7). This substantially reduces memory access cycles, and thereby lowers power requirements and increases system throughput. During the cascaded ACS operation, a partial traceback of the trellis occurs simultaneously via a unique register exchange architecture (Fig. 8). This also lowers power requirements and increases system throughput. Additionally, variable constraint length codes may be solved via bypass systems implemented within the cascaded ACS 122, and a plurality of user-supplied code polynomials may be employed (Fig. 9) to decode various encoding structures. This provides a substantial degree of flexibility in the decoder 110.

[0029] Referring initially to Fig. 5, a trellis diagram is shown in accordance with an exemplary embodiment of the present invention. The trellis corresponds to a convolutional encoder from a single shift register code having sixteen states (K=5). The sixteen states are represented by state indices 0 through 15 (e.g., 100a, 100b, 100c), which are shown in columns C1 through C5 and which correspond to particular points in time (e.g., transitions from one encoder state to the next). The transitions between columns may be referred to as stages (e.g., stage 1, stage 2, etc.). Each stage provides an input to output bit mapping from left to right, from a previous state (left) to a present or next state (right), and a set of branches (e.g., 102a, 102b) represents possible bit transitions between stages. The input to output (stage to stage) bit mapping is provided by a set of code polynomials which describe the encoder configuration and are supplied by a user.

[0030] The state indices are generated as pointers to memory locations for holding an accumulated state metric (described in more detail below) from previous stages. It is noted that each state index in each column may only transition (provide outputs) to two other defined state indices in the next stage to the right in the diagram. Likewise, each state index in a column may only receive two inputs from defined state indices in the column to its left. For example, state 8 may only transition to state 0 or state 1 in the next column (from C1 to C2, from C2 to C3, and so on). In a similar manner, state 12 in any column may only receive inputs from state 6 or state 14 in the preceding column (e.g., C1, C2, C3, etc.).
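The connectivity described here follows from the shift-register structure. Assuming the newest input bit enters the least significant position of the 4-bit state, i.e. next_state = (2 * state + bit) mod 16 (an assumption that reproduces both examples above), the successors and predecessors of any state can be computed directly. The function names are hypothetical.

```python
NUM_STATES = 16   # K = 5 trellis of Fig. 5


def successors(state, n=NUM_STATES):
    """The two states reachable from `state` in the next column (input bit 0, 1)."""
    return tuple((2 * state + bit) % n for bit in (0, 1))


def predecessors(state, n=NUM_STATES):
    """The two states in the previous column that can transition into `state`."""
    return (state >> 1, (state >> 1) + n // 2)


print(successors(8), predecessors(12))   # → (0, 1) (6, 14)

# States j and j + 8 feed the same pair of next states: a radix-2 butterfly.
for j in range(NUM_STATES // 2):
    assert set(successors(j)) == set(successors(j + NUM_STATES // 2))
```

Under this convention every state has exactly two outgoing and two incoming branches, and the 16 states split into 8 butterflies per stage.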
[0031] As will be described in more detail below, a likely path through the trellis, which ultimately determines the original input data to the encoder, may be determined by performing an ACS (Add/Compare/Select) operation for every set of branches entering each state. A set of branch metrics (described in more detail below) is added (the Add portion of ACS) to the accumulated state metrics from the previous stage (initially, the accumulated state metrics in column C1 may be reset to a desired predetermined value, e.g., a value of 0 or a very large number). Then, a branch is chosen (the Compare and Select portion of ACS) from each ACS operation based on which branch will yield the lowest or, preferably, the highest accumulated state metric for the next stage. After a number of stages have been solved via the ACS operation, the chosen branches will begin to converge on an overall path. By tracing back (described in more detail below) a path through each stage from the selected branches, the decoded data may be determined.
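The complete procedure (ACS over successive stages followed by a traceback) can be sketched end to end for the 16-state trellis of Fig. 5. The text does not give K=5 code polynomials, so the taps below are hypothetical and chosen only for illustration. The sketch further assumes antipodal soft inputs (code bit 0 as +1.0, code bit 1 as -1.0), squared Euclidean branch metrics with the smaller-is-better convention, a frame that starts in state 0, and a simple full-frame traceback from the best final state; it is not the data flow of the decoder system 110.

```python
K = 5
N = 2 ** (K - 1)                     # 16 states, as in Fig. 5
TAPS = ((0, 3, 4), (0, 1, 2, 4))     # hypothetical rate-1/2 code polynomials


def outputs(state, bit):
    """Code bits for input `bit` from `state` (bit j-1 of state holds X^j)."""
    window = [bit] + [(state >> (j - 1)) & 1 for j in range(1, K)]
    return [sum(window[j] for j in taps) % 2 for taps in TAPS]


def encode(bits):
    state, out = 0, []
    for b in bits:
        out.extend(outputs(state, b))
        state = (2 * state + b) % N          # newest bit enters the LSB
    return out


def viterbi_decode(received):
    """Decode antipodal soft values, two per trellis stage; smaller metric wins."""
    metrics = [0.0] + [float("inf")] * (N - 1)   # framed data: start in state 0
    history = []                                 # decision bits, one list per stage
    for t in range(0, len(received), len(TAPS)):
        r = received[t:t + len(TAPS)]
        new, dec = [0.0] * N, [0] * N
        for s in range(N):
            candidates = []
            for choice, p in enumerate((s >> 1, (s >> 1) + N // 2)):
                code = outputs(p, s & 1)
                bm = sum((x - (1.0 - 2.0 * c)) ** 2 for x, c in zip(r, code))
                candidates.append((metrics[p] + bm, choice))   # Add
            new[s], dec[s] = min(candidates)                    # Compare/Select
        metrics = new
        history.append(dec)
    # Traceback: start at the best final state and follow the stored decisions.
    state = min(range(N), key=metrics.__getitem__)
    decoded = []
    for dec in reversed(history):
        decoded.append(state & 1)                 # input bit that led into `state`
        state = (state >> 1) + (N // 2) * dec[state]
    return decoded[::-1]


msg = [1, 0, 1, 1, 0, 0, 1, 0] + [0] * (K - 1)   # tail bits return to state 0
tx = [1.0 - 2.0 * c for c in encode(msg)]
print(viterbi_decode([0.6 * x for x in tx]) == msg)   # → True
```

Only one decision bit per state per stage is stored during the forward pass, which is why a separate traceback step is needed to recover the bits.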
[0032] A top-level schematic block diagram of a Viterbi decoding system 110 in accordance with an embodiment of the present invention is shown in Fig. 6 and generally consists of two primary units: a State Metric Update Unit 120 and a Traceback Unit 130. The State Metric Update Unit 120 includes a cascaded ACS 122, a state metric memory 126, and a branch metric selection unit 138 for receiving branch metrics 134 from the Traceback Unit 130 and synchronizing the branch metrics 134 with the ACS 122.

[0033] The cascaded ACS 122, in conjunction with the state metric memory 126, determines a set of accumulated state metrics (SM) 125, which also may be referred to as path metrics, for each stage in the trellis as the decoding process moves forward in time. The cascaded ACS 122 performs additions, subtractions, and comparisons with a set of incoming branch metrics 134 and selects new state metrics from which path decision values 124 are determined. This is accomplished by evaluating a metric at each state to determine which one of two incoming branches provides the smallest or, preferably, the largest next state metric 125', depending on the particular algorithm implementation desired. The evaluation is performed by the ACS 122 by adding branch metrics 134 to the state metrics held in the state metric memory 126, which is addressed by the state index from which the branch originates. As will be described in more detail below, branch metrics 134 (preferably determined by a peripheral DSP 140) are sets of numbers, one set per trellis stage, which are derived from the convolutionally encoded input data to the decoder and are typically distance measures between the receiver's (input) soft decisions and known modulation points. Other forms of branch metric data, however, may be employed, and such data forms are contemplated as falling within the scope of the present invention.

[0034] Preferably, an SRAM memory 126 stores the set of state metrics 125 which are continually being read out, 
updated and written back thereto. The path decision values 124 are provided to the Traceback Unit 130 and a memory 
132 associated therewith by the cascaded ACS 122, wherein the path decision values 124 are employed in traceback 
determinations of the decoded data as will be described in more detail below. 

[0035] An address and control block 136 directs data through the trellis and provides memory addressing for the state metric memory 126. The address and control block 136, which is described in more detail below, is responsible for state index generation, which is based upon a user-supplied constraint length. The address and control block 136 is also responsible for synchronizing the branch metrics 134, which are received in the branch metric selection unit 138, with the ACS 122.

[0036] The Traceback Unit 130 is the other primary unit in accordance with this embodiment of the present invention and serves multiple functions. The Traceback Unit 130 stores path decisions 124 received from the cascaded ACS 122 and performs a traceback therefrom. The traceback process creates the output (decoded) bits, and the Traceback Unit 130 also provides storage for the decoded output and the incoming branch metrics 134. The Traceback Unit 130 preferably contains one or more memories 132 for storing such data.

[0037] A unique feature of the decoding system 110 is the partitioning of the overall decoding process between the decoding system 110 and, preferably, a DSP 140 to which the system 110 provides support. All of the branch metric computations preferably are performed external to the system 110, preferably in the host DSP 140. Likewise, depuncturing manipulations may also be performed by the DSP 140 (e.g., insertion of null or other compensating values into the input stream). This provides for more user control over these functions (e.g., branch metric computation, depuncturing, etc.).

[0038] The decoder system 110 is flexible in operation. Specifically, it may operate on constraint lengths of 5 through 9, and process up to 256 states over four trellis stages simultaneously. The system 110 processes the rate 1/2 and 1/3 cases with arbitrary sets of user-supplied code coefficients. Also, the bit rate may be variable (e.g., the decoder may operate by detecting a received data frame of a fixed size, regardless of the bit rate). The system 110 may process framed input data where the tail (input bits inserted to force a particular state) of the data forces a return to state zero, or the system may run in a continuous decode mode with no forced states. Certain options for effectively presetting state metrics at the start of a frame also are available. For example, a user may desire to set state zero's initial metric to a largest value and all other states to a smallest value to force all traceback paths to return to state zero at the start of the frame. The convergence distance that the traceback process utilizes before generating output bits also is an adjustable parameter and is supplied by a user.

[0039] A DSP interface circuit 144 provides a memory mapping interface to the decoder system 110. The DSP interface 144 operates utilizing block data transfers of incoming branch metrics and outgoing decoded bits (shown as bus 146). These transfers may be performed employing DMA (or other) peripheral DSP support. Thus, the bus is utilized efficiently and minimal interaction is required from the DSP 140.

[0040] Now referring to Fig. 7, a more detailed block diagram of the cascaded ACS unit 122 is shown in accordance with an embodiment of the present invention. The ACS unit 122 processes a set of state metrics 125 (from the state metric memory 126 of Fig. 6) along with a corresponding set of branch metrics 134a through 134d (collectively referred to as 134, and received from the traceback memory 132). This is achieved by processing the set of state metrics 125, which are carried forward in time, stage to stage, as accumulated state metrics, through the trellis depicted in Fig. 5. At each stage of the trellis, which corresponds to ACS stages 150b through 156b, accumulated state metrics are updated utilizing the branch metric data 134 of the current stage. State metric updates are accomplished by determining the optimal branch (identified path decision) from the two possible trellis branches from the previous trellis states. It is noted that one ACS operation is provided for each node in a column per trellis stage. For example, in Fig. 5, 16 ACS operations are provided for each of columns C2, C3, C4 and C5; therefore, each of the ACS stages 150b, 152b, 154b and 156b of Fig. 7 performs 16 ACS operations.

[0041] The optimal branch refers to the branch (identified path) which yields the smallest or, preferably, the largest next state metric as defined by adding the branch metric 134 to the accumulated state metric 125 of the trellis state from which the trellis branch originates. Each trellis branch corresponds to a set of possible output bits, and the branch metric corresponds to a distance measure from the output bits to the received input data. The output bits are preferably mapped to a constellation point which is transmitted, and the branch metric is the Euclidean distance between the received data and the constellation point. Branch metric computations are well known in the art and further discussion related thereto is omitted for the sake of brevity.

[0042] As the ACS process is performed, the chosen branches (e.g., path decisions) for each state at each stage
of the trellis are recorded. Thus, the optimal paths (e.g., identified paths) to each state are known. The decoder system
110 output is then determined by traversing the trellis in the reverse direction, following the branches selected
above. After a certain distance (known as the convergence distance), all of the identified paths from other trellis
states will most likely have converged to a single path. At this point, valid decoded output bits may be derived from the
traceback process, which is described in more detail below.
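The basic traceback can be sketched as follows, assuming a shift-register state convention in which input bit `u` moves state `s` to `((s << 1) | u) & (N - 1)` (the convention that the state-index example of Fig. 5 later in this description follows). Under that assumption the stored decision bit tells whether a state's predecessor had a most significant bit of 0 or 1, and the input bit that led into a state is its least significant bit. The `decisions` layout and the function name are hypothetical.

```python
def traceback(decisions, final_state, num_states):
    """Trace back through stored path decisions.

    decisions[k][s] is the decision bit recorded for state s at stage k
    (0: predecessor's MSB was 0, 1: predecessor's MSB was 1).
    Returns the decoded input bits, oldest first.
    """
    msb_pos = num_states.bit_length() - 2    # MSB position for num_states = 2**m
    state = final_state
    bits = []
    for stage in reversed(range(len(decisions))):
        bits.append(state & 1)               # input bit that led into `state`
        d = decisions[stage][state]
        state = (state >> 1) | (d << msb_pos)  # step to the chosen predecessor
    bits.reverse()
    return bits
```

For a 4-state trellis driven from state 0 by the inputs 1, 0, 1, 1 (visiting states 1, 2, 1, 3), only one nonzero decision lies on the surviving path, so `traceback([[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]], 3, 4)` recovers `[1, 0, 1, 1]`. In a real decoder the traceback starts a convergence distance beyond the bits being released.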

[0043] As shown in Fig. 7, the ACS unit 122 of this embodiment of the present invention forms a cascade and consists
of four ACS blocks 150b through 156b, with groupings of delay registers 160a through 160f and cross switches 162
through 164 between the blocks. Each ACS block performs a plurality of radix-2 (butterfly) Add/Compare/Select
operations over one stage of the trellis. The ACS cascade 122 performs a radix-16 ACS operation over four stages of the
trellis. Radix-N refers to the size of the sub-trellis that is operated upon; N is the number of states and must be a
power of two. As will be described in more detail below, the trellis shown in Fig. 5 may be applied to larger trellises, up to
256 states per stage for K=9, by employing it as a sub-trellis over multiple states. Basic cascade
structure operation is described further in "Algorithms and Architectures for high speed Viterbi decoding", Ph.D.
Dissertation, Dept. of Electrical Engineering, Stanford University, 1993, by Peter Black, which is hereby incorporated by
reference in its entirety.

[0044] The radix-16 ACS operates on a 16 state trellis and computes new state metrics for a forward step of four 
stages in the trellis. The cascade implementation achieves this by computing the state metrics for each intermediate 
stage (two states per radix-2 ACS unit within ACS blocks 150b through 156b) and passing the accumulated state met- 
rics, with appropriate reordering (routing the outputs of the present stage to the correct inputs of the next stage) to the 
next cascade stage. The registers 160a-160f and cross switches 162-166 between the ACS blocks 150b through 156b, 
perform reordering as defined by the particular trellis stage. The cross switches either pass the data straight through or 
exchange the data on the two busses (shown as Bus A and Bus B) depending on which portion of the trellis is being 
operated upon. The cross switch settings may change at set rates during decoder 110 operation. 
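The sketch below models one radix-16 forward step as four radix-2 trellis stages applied back to back to a 16-state metric vector, so the metrics are read once at the start of the step and written once at the end. It assumes a shift-register state convention in which states `j` and `j + N/2` feed states `2j` (input 0) and `2j + 1` (input 1), and a caller-supplied branch metric function `bm(prev_state, input_bit)`; both are illustrative assumptions. The hardware performs the same data movement with the delay registers and cross switches rather than with array indexing.

```python
def radix2_stage(sm, bm):
    """One trellis stage of radix-2 butterflies over all states.

    Butterfly j reads previous states j and j + N/2 and writes next states
    2j (input 0) and 2j + 1 (input 1). Smaller metric wins.
    """
    n = len(sm)
    half = n // 2
    new_sm = [0.0] * n
    dec = [0] * n
    for j in range(half):
        for u in (0, 1):
            a = sm[j] + bm(j, u)
            b = sm[j + half] + bm(j + half, u)
            new_sm[2 * j + u], dec[2 * j + u] = (a, 0) if a <= b else (b, 1)
    return new_sm, dec


def radix16_step(sm, stage_bms):
    """Four cascaded radix-2 stages (one per ACS block); returns the new
    state metrics and the four per-stage decision vectors."""
    decisions = []
    for bm in stage_bms:
        sm, dec = radix2_stage(sm, bm)
        decisions.append(dec)
    return sm, decisions
```

Starting in state 0 with a cost of 1 per input `1`, `radix16_step([0.0] + [100.0] * 15, [lambda s, u: float(u)] * 4)` gives each state `t` a metric equal to the number of 1 bits in `t`, because after four stages the 16-state register has been completely refilled by the inputs.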
[0045] For operation on convolutional codes with 256 states, the trellis can be considered to be composed of an
interleaving of 16 subtrellises of size 16. Thus, these subtrellises are fed to the cascade ACS datapath 122 sequentially,
one after another. The correct nodes for each subtrellis are read from the SM memory 126 and fed to the ACS
datapath 122, and the results are then stored back into the SM memory 126. For all constraint length cases (K=9 through K=5),
'in-place scheduling' is employed, so only one copy of the state metrics is stored. In-place scheduling refers to previous
state metrics being overwritten by new state metric results after the ACS computations have completed.
[0046] The manner in which the 'in-place scheduled' trellis is partitioned into subtrellises has two phases, a phase
A and a phase B, which repeat when moving forward through the trellis. For example, the 16 state trellis of Fig. 5 may
be partitioned into subtrellises of size 4 over two stages. Phase A covers stage 1 and stage 2, wherein the subtrellises
are interwoven. Phase B covers stage 3 and stage 4, wherein the subtrellises are separated and appear one above
another. This results in two distinct phases for generating memory addresses and state indices.
[0047] As compared to more traditional approaches which may operate on only one stage of the trellis at a time, the 
radix-16 cascade approach of preferred embodiments of the present invention is more energy efficient. Traditional 
approaches require reading and writing all state metrics once per stage while a preferred embodiment of the present 
invention reduces this to once per four stages. Thus, power savings follow since memory I/O transactions consume 
large amounts of power. 
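Counting state metric memory transactions makes the claimed saving concrete. This back-of-the-envelope sketch assumes every state metric is read once and written once per pass over the memory, with no other caching; the function name is hypothetical.

```python
def sm_memory_accesses(num_states, num_stages, stages_per_pass):
    """Count state metric memory accesses: each metric is read once and
    written once per pass, and one pass covers `stages_per_pass` stages."""
    passes = -(-num_stages // stages_per_pass)   # ceiling division
    return 2 * num_states * passes
```

For a 256-state code over 1000 trellis stages, `sm_memory_accesses(256, 1000, 1)` is 512000 while `sm_memory_accesses(256, 1000, 4)` is 128000, the fourfold reduction described above.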

[0048] In order to provide more efficient traceback operations (discussed below), a partial pretraceback of length four
is performed during the cascade ACS operation using a novel method. Pretraceback implies that a partial
traceback has been performed for each trellis stage prior to storing the path decision information for later traceback
completion. In accordance with an embodiment of the present invention, the system employed to implement pretraceback
is a combination of a unique register exchange (170a, 170b in Fig. 8) with extensions of the reordering hardware
160a through 160f and 162, 163, and 164, which is located between the ACS blocks 150b through 156b.
[0049] A pretraceback system is depicted in Fig. 8 in accordance with an embodiment of the present invention.
The registers 166, which are part of the ACS data path, in the reordering structures are made wider (depicted as n bits)
such that they can hold accumulating pretraceback paths. Registers 166 also provide a 10 bit accumulated state metric
value from the previous stage. It is noted that before stage 150b in Fig. 7, n is equal to 0 bits since no trellis stages have
been solved (ACS path determination) at this point. In stage 152b, n is equal to 1 bit since one stage has now been
determined (path chosen); in stage 154b, n = 2 bits; and in stage 156b, n = 3 bits. After each ACS block 168a and 168b
(the top half of the ACS block, 168a, determines the path for the top butterfly node, and the bottom half, 168b, determines
the path for the bottom butterfly node), an additional bit is provided to the next stage as a result of the ACS operations.
[0050] The additional bit indicates which path was selected from the previous stage as a result of the ACS operation.
Bits 172a and 172b are provided as additional bits from the ACS blocks 168a and 168b to select (via muxes 170a
and 170b) the selected paths 174a through 174d which have been carried forward (forwarding) from previous stages,
to be appended to the selection path from the present stage. The appending of the additional bits from the
present stage to the chosen pretraceback bits from the prior stages (output of muxes 170a and 170b) is shown at reference
numbers 176a and 176b. The register paths to the next stage are then represented as n+1 to indicate the accumulation
of partial pretraceback bits which are carried forward to the next succeeding stage.
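The register exchange can be sketched by letting each state carry an n-bit path register alongside its metric. After each ACS the decision bit selects (as muxes 170a and 170b do) the path register of the surviving predecessor, and the present decision is appended as the new least significant bit, so n grows by one per stage. The sketch assumes a shift-register state convention in which states `j` and `j + N/2` feed states `2j` and `2j + 1`; under that assumption the four bits accumulated over a 16-state radix-16 step equal the index of the predecessor four stages back, which is why they can feed the traceback directly. Function names and bit ordering are hypothetical.

```python
def acs_stage_with_paths(sm, paths, bm):
    """One radix-2 ACS stage that also forwards pretraceback path bits.

    paths[s] holds the decision bits accumulated so far for state s; the
    selected predecessor's bits are shifted left and the present decision
    is appended as the new least significant bit.
    """
    n = len(sm)
    half = n // 2
    new_sm = [0.0] * n
    new_paths = [0] * n
    for j in range(half):
        for u in (0, 1):
            a = sm[j] + bm(j, u)
            b = sm[j + half] + bm(j + half, u)
            d = 0 if a <= b else 1
            src = j + d * half                      # mux: pick surviving path
            new_sm[2 * j + u] = a if d == 0 else b
            new_paths[2 * j + u] = (paths[src] << 1) | d
    return new_sm, new_paths


def pretraceback_radix16(sm, stage_bms):
    """Run four stages starting from empty (n = 0) path registers."""
    paths = [0] * len(sm)
    for bm in stage_bms:
        sm, paths = acs_stage_with_paths(sm, paths, bm)
    return sm, paths
```

If decoding starts in state 5 (all metrics 100.0 except a 0.0 for state 5), one radix-16 step with `bm = lambda s, u: float(u)` leaves every path register equal to 5, the common ancestor four stages back.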

[0051] The register exchange described above for partial pretraceback, in combination with the cascaded ACS 122,
provides a unique decoding architecture for reducing memory accesses and reducing power consumption. The register
exchange in conjunction with the cascaded ACS of an embodiment of the present invention updates the traceback memory
(described below) only after determining paths over four stages of the trellis. This reduces traceback memory accesses by
a factor of four, which substantially reduces power consumption and substantially increases decoder 110 performance.
[0052] The cascade ACS structure 122 is also employed for decoding codes of constraint lengths less than 9
(fewer than 256 states). There are two embodiments for implementing this feature, which provides flexibility for operating
with convolutional encoders of various constraint lengths.

[0053] The preferred embodiment remains in harmony with the geometry of the trellis, in the manner of an in-place
schedule (continually overwriting past state metric determinations with state metric determinations from the
present stage). The geometry of the trellis, as shown in Fig. 5, repeats after a distance equal to the memory length (K-1).
The computation for each constraint length code is divided into two phases, which may not be symmetric, but such that
the sum of the lengths of the two phases (number of stages) is always equal to the memory length. In particular, for K=8,
128 states, a phase I computation of radix-16 is performed over 4 stages of the trellis and a phase II computation of
radix-8 is performed over 3 stages of the trellis. Similarly, for K=7, phase I is computed over four stages at radix-16, and
phase II is computed over 2 stages at radix-4. For K=6, phase I is computed over four stages at radix-16 and phase II is
computed over one stage at radix-2. For K=5, phase I is computed over four stages at radix-16 and phase II is unnecessary.
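The phase split above follows a simple rule: phase I always covers four stages at radix-16, and phase II covers the remaining K - 5 stages at radix 2^(K-5). The sketch encodes that rule; the K=9 entry (two radix-16 phases) is an inference from the same sum rule rather than a value stated in this paragraph, and the function name is hypothetical.

```python
def phase_schedule(K):
    """Return [(radix, stages), ...] for one trellis period of K - 1 stages:
    phase I is radix-16 over 4 stages, phase II covers the remaining stages."""
    if not 5 <= K <= 9:
        raise ValueError("this sketch covers constraint lengths 5 to 9")
    remaining = (K - 1) - 4
    phases = [(16, 4)]
    if remaining:
        phases.append((2 ** remaining, remaining))
    return phases
```

For example, `phase_schedule(8)` gives `[(16, 4), (8, 3)]` and `phase_schedule(5)` gives `[(16, 4)]`; in every case the stage counts sum to the memory length K - 1.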

[0054] To operate the cascade ACS structure for a radix-8, 3 stage situation, for example, the first ACS stage 150b
is bypassed and the metrics flow through the data path circuits 160a, 160b and 162, and are effectively fed into the 2nd
ACS block 152b. Analogously, for radix-4, 2 stage operations, data is effectively fed into the 3rd ACS block 154b. For
radix-2 operation, data is effectively fed into the last ACS block 156b. The bypass mechanism may be employed within
the ACS block and may be any well known switching system (e.g., a mux selected for bypass) for digitally routing
accumulated state metric data through (around) the ACS block and reordering hardware without undergoing any
computational changes. In an alternative embodiment, the ACS blocks and data path preceding the first ACS block
required for computation may be completely bypassed. The radix-8 case, for example, would require bypassing circuits
150b, 160a, 160b and 162, which are shown in Fig. 7.
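The bypass selection reduces to a small rule: a radix-2^s step uses the last s ACS blocks of the cascade, and the first 4 - s blocks only pass metrics through. The sketch below captures just that selection logic, not the routing itself; the tuple of block labels and the function name are illustrative.

```python
ACS_BLOCKS = ("150b", "152b", "154b", "156b")   # cascade order in Fig. 7


def active_acs_blocks(radix):
    """ACS blocks that compute for a radix-2**s step; the first 4 - s blocks
    are bypassed and metrics are routed through them unchanged."""
    stages = radix.bit_length() - 1
    if radix != 1 << stages or not 1 <= stages <= 4:
        raise ValueError("radix must be 2, 4, 8 or 16")
    return ACS_BLOCKS[4 - stages:]
```

For example, `active_acs_blocks(8)` returns `("152b", "154b", "156b")`, matching the radix-8 case in which block 150b is bypassed.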

[0055] An advantage of the preferred embodiment is that generating state indices or memory addresses is relatively
straightforward. It is to be appreciated that the ordering needed for each constraint length case may be expressed
as an instance of a more general ordering algorithm. For example, unique address generators may be designed for this
embodiment. An alternative embodiment operates by always performing radix-16 operations over 4 stages of the trellis,
provided K>4. However, address generation then becomes more involved.

[0056] Each ACS block 150b through 156b in the cascade datapath operates on a single trellis stage until all states
have been processed. This implies that the set of branch metrics 134 for a given stage is provided to the State Metric
Update Unit 120 for use in the associated ACS block. However, each butterfly operation in an ACS block requires a
particular branch metric from the current data set, and that particular branch metric must be determined and selected.
The branch metric selection depends upon the trellis state index and a user-supplied code polynomial.

[0057] The preferred embodiment of the present invention provides two equivalent hardware blocks of equal size
and virtually identical structure for providing the appropriate branch metrics to the ACS blocks 150b-156b. One BM
selection unit 138 serves the first two ACS blocks 150b and 152b, and the other BM selection unit 138 serves the last
two ACS blocks 154b and 156b.

[0058] Fig. 9 depicts the general structure of one such branch metric selection unit 138. Each BM selection unit 138
consists of a state index generator 202, which provides state indices for the trellis, and a branch metric index block 204.
It is to be appreciated that the state index generator 202 may be considered part of the address and control block
136 described earlier in Fig. 6. The branch metric index block 204 produces a first set of BM indices 206 and a second
set of BM indices 208, one for each ACS block 210 and 212. The second set of indices 208 is fed through a chain of
delay registers 214, which causes the indices to arrive at a BM selection multiplexor 216a at the correct time
(synchronized to the ACS computation). The delays 214 follow the delay through the ACS cascade 210 and 212 and the
associated reordering hardware described above.

[0059] The BM selection muxes 216a and 216b utilize the indices from the branch metric index logic 204 to select
the correct BM from branch metric holding registers 218a and 218b, out of the set of branch metrics for the ACS stage.
One bit 220a and 220b of the indices is also fed to the ACS blocks 210 and 212 and denotes the sign of the particular
branch metric. As a result, only half of the branch metrics are stored and transported. It is to be appreciated that other
convolutional coding schemes may be employed that may require more branch metrics to be stored and are thus
contemplated by the present invention.
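Why one index bit can act as a sign is easiest to see with a correlation branch metric (the "largest metric wins" variant), which is an assumption here: with BPSK symbols (+1 for bit 0, -1 for bit 1), complementing every output bit negates the metric. The sketch stores only the metrics whose most significant index bit is 0 and recovers the rest by complementing the index and negating. Treating the MSB as the sign bit, and all function names, are hypothetical choices for illustration.

```python
def correlation_bm(received, index):
    """Correlation of received soft values with the BPSK symbols of the
    output bits in `index` (MSB first; bit 0 -> +1, bit 1 -> -1)."""
    n = len(received)
    total = 0.0
    for i, r in enumerate(received):
        bit = (index >> (n - 1 - i)) & 1
        total += r if bit == 0 else -r
    return total


def half_table(received):
    """Store metrics only for indices whose most significant bit is 0."""
    n = len(received)
    return [correlation_bm(received, idx) for idx in range(1 << (n - 1))]


def lookup_bm(table, index, n):
    """Use the index MSB as a sign bit: complement the index, then negate."""
    if index >> (n - 1):
        return -table[index ^ ((1 << n) - 1)]
    return table[index]
```

For a rate 1/3 code the table holds 4 entries instead of 8, and `lookup_bm` returns the same value as computing `correlation_bm` directly for every index.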

[0060] The state index generators (the generator of the other branch metric selection unit is not shown) create
identical sequences representing the sequence of state indices for the trellis states fed into the ACS cascade datapath.
The second state index generator (not shown) provides a delayed start relative to the first state index generator 202 to
ensure proper time alignment with cascade stages 3 and 4. The BM index block 204 employs each state index
together with the code polynomials (in a manner similar to how the convolutional encoder generates output bits) to
generate each branch metric index.

[0061] It is to be appreciated that the state sequence order is different in each ACS cascade stage, and thus additional
operations must be done in cascade stages 2, 3 and 4. In effect, the correct state index for each ACS stage is
derived from the incoming index. The incoming index is generated as the first cascade stage sequence, but is directly
related to the required index due to the reordering of the cascade structure, which follows the geometry of the trellis.
Thus, the required next stage indices may be derived by shifts of the incoming indices with the proper appending of 0's
and 1's to fill the new empty slots (state metric column addresses). The proper set of 0's and 1's is defined by known
trellis connections between the states and the state index position in the trellis. A counting mechanism 222 may be
employed to provide the appended bits and determine when to utilize different sets of indices for any particular cascade
stage. For cascade stage 2, for example, one bit is appended; for stage 3, two bits are appended; and for stage 4, three
bits are appended.

[0062] To further illustrate state index generation of the present invention, and referring back to the trellis of Fig. 5,
the derivation of the state indices for stage 2 from those of stage 1 is described in more detail. Because of the properties
of the trellis's butterfly structure, only one state index per butterfly is required. Also, note that when an index or node is
referred to with regard to a particular stage, the index or node is referring to the left side of the stage.

[0063] The lower butterfly indices of column C1 are initially generated for stage 1, which produces the sequence of
8 indices: 8, 9, 10, 11, 12, 13, 14, and 15. To produce the state indices for stage 2 in the correct order of butterflies (i.e.,
top to bottom), observe the first four butterflies in stage 2, which begin atop column C2. Note that the top node of each
of these butterflies in column C2 connects to a lower node in stage 1, namely those of the first four butterflies in stage 1,
specifically the butterflies with indices 8, 9, 10 and 11. These indices are directly manipulated to produce the first four
indices for stage 2 as follows: interpret the indices as 4 bit numbers; shift each index leftwards; drop the most
significant bit; and set the least significant bit to '0'. This mimics the action of the convolutional encoder
when at any of these nodes and given a '0' as the input bit. The resulting four indices are 0, 2, 4 and 6 in column C2.
Thus, indices for the top nodes of the first four butterflies of stage 2 are produced.

[0064] Now turning to the last four butterflies of stage 2 in Fig. 5, the bottom nodes connect to the bottom nodes of
the last four butterflies of stage 1. Thus, the indices 12, 13, 14 and 15 from stage 1 are left shifted, the most significant
bit is dropped, and the least significant bit is set to '1'. This produces the indices 9, 11, 13 and 15 of column C2, again
mimicking the convolutional encoder, this time assuming an input bit equal to '1'. The same process may be repeated for
stage 3 and stage 4. It can be shown that the appended bits (the least significant bits above) follow a set pattern,
specifically a counting pattern from top to bottom. In the example above, the pattern is a '0' and then a '1', each for four
consecutive butterflies. For the next stage, the pattern becomes "00", "01", "10", and "11", each for two consecutive butterflies.
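The index manipulation in this example can be written compactly. The sketch derives the per-butterfly indices for a cascade stage from the stage-1 sequence by shifting left by `stage - 1`, masking to the state width, and appending the top-to-bottom counting pattern held for `8 / 2^(stage - 1)` consecutive butterflies. The function name is hypothetical, and only the stage-2 result is checked against the worked example above.

```python
def stage_indices(stage1, stage, num_states=16):
    """Derive per-butterfly state indices for cascade `stage` (1-based)
    from the stage-1 sequence: shift left by (stage - 1), drop the bits
    above the state width, and append a top-to-bottom counting pattern."""
    shift = stage - 1
    mask = num_states - 1
    hold = len(stage1) >> shift           # butterflies per pattern value
    return [((idx << shift) & mask) | (pos // hold)
            for pos, idx in enumerate(stage1)]
```

`stage_indices(list(range(8, 16)), 2)` yields `[0, 2, 4, 6, 9, 11, 13, 15]`, matching columns C2 above. With `stage=3` the same function appends the two-bit pattern 00, 01, 10, 11, each for two consecutive butterflies.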

[0065] Generating a branch metric index is analogous to generating a set of output bits for a given state in the
convolutional encoder 44 shown in Fig. 3, for example. For branch metric indices, however, some choice of a hypothetical
input bit is required in addition to either the upper or lower state index of the butterfly. The input bit may simply be
set to '0' when the state index is for the top butterfly node and to '1' for the opposite case. In addition, from the above
discussion, it can be seen that the input bit is identical to the least significant bit of the state indices that were created
from the previous stage's indices, since a '0' is appended to reach a next stage's top node and a '1' to reach a next
stage's bottom node.

[0066] Referring back to Fig. 4, if node 68a is the index with a most significant bit of '0' and node 68c is the index
with a least significant bit of '0', then the branches 70a and 70b can only be traversed with an input bit of '0' and
branches 70c and 70d can only be traversed with an input bit of '1'. Further, it is well known that the output bits from
branch 70a will be identical to those of branch 70d. It is this property that is utilized above to allow either the butterfly's
top index 68a or bottom index 68b to be utilized for generating a branch metric index, provided the hypothetical input bit
is set to select the horizontal branches in each case.

[0067] Referring again to Fig. 9, there are various embodiments in which the state sequence order operation
described above may be implemented. In a preferred embodiment, arbitrary code polynomials are provided, in which
case a code polynomial register 224a-224c may hold each polynomial. Bit by bit, each polynomial may be logically
ANDed with the derived state index from above and the resulting bits EXCLUSIVE-ORed together to form one bit of the
branch metric index (similar to convolutional encoding). Alternatively, if only a certain number of different codes are to
be implemented, hardware may be provided specifically for those codes. This may be accomplished by writing out the
logic equations for the branch metric indices (bit by bit) for each code polynomial, in terms of the state index, counting
index, and fixed code polynomial. Logic synthesis may then be employed to produce a compact representation. In either
case, the same hardware may handle codes of different constraint lengths by logically cutting off (e.g., shifting, rotating)
unneeded, higher order address bits. It is to be appreciated that the number of code polynomials may vary depending
on the code rate. The embodiment depicted in Fig. 9 operates on code rates of 1/2 and 1/3. It is further to be
appreciated that the invention as disclosed herein may be applied to a plurality of other code rates (e.g., 1/4, 1/5, etc.).
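For the programmable-polynomial embodiment, each bit of the branch metric index is the parity of a code polynomial ANDed with the encoder register built from the state index and the hypothetical input bit. The sketch assumes the register is `(state << 1) | input_bit` masked to K bits (newest bit in the LSB) and emits one index bit per polynomial with the first polynomial in the most significant position; these bit orders, and the function names, are assumptions. The K-bit mask plays the role of logically cutting off unneeded higher order bits.

```python
def parity(x):
    """XOR of all bits of a non-negative integer."""
    return bin(x).count("1") & 1


def branch_metric_index(state, input_bit, polys, K):
    """One index bit per code polynomial: AND the polynomial with the
    K-bit encoder register, then XOR the resulting bits together."""
    register = ((state << 1) | input_bit) & ((1 << K) - 1)
    index = 0
    for g in polys:
        index = (index << 1) | parity(g & register)
    return index
```

With the common K=3, rate 1/2 polynomials `(0b111, 0b101)`, `branch_metric_index(0, 1, (0b111, 0b101), 3)` is `3` (both outputs 1) and `branch_metric_index(1, 0, (0b111, 0b101), 3)` is `2`.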
[0068] In addition, a single state index generator 202 may be employed to feed a branch metric index block that
computes all four branch metric indices; each index may then be sent to the respective branch metric selection mux
through the necessary delay chain. Alternatively, a generator for each ACS block may be employed with an appropriately
delayed start time, thereby eliminating the delay registers.

[0069] Referring back to Fig. 6, the Traceback Unit 130 will now be described in greater detail. The main function
of the Traceback Unit 130 is to perform traceback operations on the partial pretraceback decisions which have been
stored in the traceback memory 132 during the ACS operations described above. The Traceback Unit 130 also performs
the functions of accumulating and storing the decoded output data, and serves as buffer storage for the branch
metrics on their route from the DSP 140 to the State Metric Update Unit 120.

[0070] The traceback operation consists of traversing backwards through the path decision data, following a path
that each new data item helps to construct. Once enough steps of the traceback process have been accomplished (the
convergence distance), decoded output bits, which are derived from the same data, may be accumulated. The basic
traceback operation is well known. The core of the Traceback Unit 130 is constructed as a direct implementation of the
Viterbi algorithm as directed by the manner in which path decisions are stored in memory 132. The traceback operation of
embodiments of the present invention is unique in that it operates on codes of various constraint lengths and with various
lengths of pretraceback. Additionally, the Traceback Unit 130 simultaneously provides decision storage, traceback,
decoded output storage and branch metric storage. The various storage requirements may be provided by one or more
memory circuits.

[0071] For traceback operation with a pretraceback of length four (the most common situation), for example, a 32 bit
word, which is described in more detail below, is read from the traceback memory 132, and the lower (least significant)
bits of the state index shift register (not shown) are employed to select four bits from the 32 bit word. The four bits are
the next pretraceback item that is needed and immediately become part of the next state index because these items
have already been traced backwards. A portion of the MSBs of the state index is utilized together with a circular count
variable to form the memory address.
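A minimal sketch of one traceback step with a pretraceback of length four follows. It assumes a hypothetical layout in which the 4-bit paths of one radix-16 step for N states occupy N/8 consecutive 32-bit words, the three LSBs of the state index select the nibble, and the remaining index bits plus a block number form the word address. It also assumes a shift-register state convention (next state `((s << 1) | u)` masked to the state width), under which the state four stages back is `state >> 4` with the four traced-back bits placed above it. A dict stands in for the traceback memory.

```python
NIBBLES_PER_WORD = 8   # a 32-bit word holds eight 4-bit pretraceback items


def store_pretraceback(memory, block, paths):
    """Pack the 4-bit pretraceback paths of one radix-16 step into 32-bit
    words: state s goes to word (s >> 3), nibble position (s & 7)."""
    words_per_block = len(paths) // NIBBLES_PER_WORD
    for s, path in enumerate(paths):
        addr = block * words_per_block + (s >> 3)
        memory[addr] = memory.get(addr, 0) | ((path & 0xF) << (4 * (s & 7)))


def traceback_step(memory, block, state, num_states):
    """Go back four trellis stages with a single memory read."""
    words_per_block = num_states // NIBBLES_PER_WORD
    word = memory[block * words_per_block + (state >> 3)]  # MSBs + block -> address
    nibble = (word >> (4 * (state & 7))) & 0xF             # LSBs select 4 bits
    m = num_states.bit_length() - 1                        # state index width
    return (state >> 4) | (nibble << (m - 4))              # nibble joins the index
```

Because the four bits are already traced back, they become the upper bits of the next state index directly, so each memory read advances the traceback by four stages. In hardware the block number would come from a circular counter over the traceback buffer.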

[0072] When performing traceback with pretraceback lengths less than four, or codes of smaller constraint length, 
the hardware is constructed to select only the length needed from the memory word. The number and position of bits 
from the state index register are also adjusted accordingly. In this manner multiple codes are enabled. 
[0073] Once the convergence length (survivor path distance) has been passed in the traceback process, the
decoded output bits may be accumulated. The decoded output bits come directly from the selected portion of the memory
word that is provided to the state index. These bits come in groups of identical size to the pretraceback length, and
are accumulated into a 32 bit register and stored in the traceback memory 132 as needed.
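Accumulating decoded bits in groups of the pretraceback length can be sketched as a small register model. It stacks groups of 1 to 4 valid bits, LSB-first (an assumed bit order), and emits a 32-bit word whenever one is full. Because a group can straddle a word boundary, up to 35 bits may be held at once, which is consistent with the 35-bit accumulation register 338 of Fig. 10. The class name is hypothetical.

```python
class OutputAccumulator:
    """Stack variable-size groups of decoded bits into 32-bit output words."""

    def __init__(self):
        self.reg = 0       # accumulation register (at most 35 bits in use)
        self.count = 0     # number of valid bits currently held
        self.words = []    # completed 32-bit words, in order

    def push(self, bits, nbits):
        """Append a group of 1 to 4 valid bits above the bits already held."""
        self.reg |= (bits & ((1 << nbits) - 1)) << self.count
        self.count += nbits
        if self.count >= 32:                      # a full word is available
            self.words.append(self.reg & 0xFFFFFFFF)
            self.reg >>= 32                       # keep the 0 to 3 leftover bits
            self.count -= 32
```

Pushing the 4-bit groups 1 through 8 produces the single word `0x87654321`; with 3-bit groups (a shorter pretraceback), the eleventh push completes a word and leaves one bit carried over.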

[0074] The pretraceback decision bits arriving from the SM Update Unit 120 are also accumulated into a 32 bit register
before being written into the traceback memory 132. The information is arranged sequentially within the register
and in the memory 132 such that portions of the constructed state index may be utilized as pointers to the information,
as discussed above.

[0075] The Traceback Unit 130 controls all the I/O operations and buffer management concerning the traceback
memory 132. Decoded output storage, branch metric storage and decision data storage all require circular buffers.
[0076] Now referring to Fig. 10, a more detailed schematic block diagram of the Traceback Unit 130 is shown. In
particular, Fig. 10 depicts how the Traceback Unit stores decision data 124 (i.e., partial pretraceback data) which is
received from the cascaded ACS 122 illustrated in Fig. 6. An I/O memory 132a is included for storing decoded output
words 300 and providing output words 300a to an external DSP 140, shown in Fig. 6. The I/O memory 132a also stores
incoming branch metric words 310 from the DSP 140, and provides the appropriate branch metric words 134 to the
State Metric Update Unit 120. The Traceback Unit 130 also performs and fully controls all of the traceback operations and
provides address generation and FIFO management, as will be described in more detail below.

[0077] A plurality of multiplexors are shown in Fig. 10 for Traceback Unit 130 operations. Some of these are standard
multiplexors in that they only choose as an output one of the input bit vectors that are shown. However, other
multiplexors are generalized multiplexors. These have complex descriptions and choose the output from the inputs by
following a custom choice of inputs, which depends upon the control bits to the multiplexors. That is, the multiplexors
may consider all the input vectors as individual bits grouped together and can choose any of these bits in any order as
the selected output bits. These are built by providing particular definitions for bit choices which depend upon control bits
within a VHDL process. This is then synthesized, usually into a layered structure of traditional multiplexors.
[0078] A Traceback Memory Mux 312 is a standard multiplexor and chooses either the decision store address 314
or the traceback address 316 to present to the traceback memory 132. A Decision Mux 318 is a generalized multiplexor
and chooses decision input vectors and some feedback bits such that 8 bits of decisions 318a and 318b are placed into
non-overlapping positions within a 32 bit register 320. The effect is that the vectors are stacked in order of arrival in the
register 320 and stored in the memory 132 after 32 bits, or 4 vector sets, have arrived. An I/O Memory Data Mux 322 is a
standard multiplexor and selects output words 300 or branch metric words 310 for storage in the I/O memory 132a.
[0079] A Traceback Mux 324 is a standard mux and considers a 32 bit input vector 324a as 8 vectors of 4 bits each,
all in linear order. The mux 324 chooses one of the 4 bit vectors (318a or 318b that were previously stored in a word)
as output 324b. Each set of 4 bits 324b is a partial pretraceback path segment (i.e., one of the 4 bit decisions that was
stored previously, though at times only one, two, or three of the bits may be valid because of bypassing, as discussed
above). The four bit partial pretraceback decisions 324b are then routed through standard muxes 326a, 326b and 326c.
[0080] A generalized mux 328 selects the correct 3 bits out of the 6 input bits 328a and 328b to use in the next cycle 
for choosing a correct portion of the traceback word. These 6 input bits 328a and 328b represent a portion of a present 
state index in the traceback process. The mux 328 selection depends upon the convolutional code's constraint length 
and how many stages of the trellis are being operated upon. 

[0081] A generalized mux 330 selects, from the 8 bits of the present state index 330a or 330b, a set of
4 bits that will be decoded output bits. However, at times only 3, 2, or 1 of these bits may be valid, depending on
the constraint length.

[0082] A generalized mux 332 effectively stacks valid decoded output bits 336 into an accumulation register 338.
When 32 valid bits are stored in the accumulation register 338, they are sent to the I/O memory 132a via generalized
mux 340. Mux 340 selects the correct 32 bits which are valid decoded output bits out of the 35 bits in the accumulation
register 338. The selection depends upon the constraint length and the number of trellis stages.

[0083] A generalized Traceback Address Mux 342 forms the lower portion of the traceback address 342d for the 
next traceback word 324a. The mux 342 selects the correct address bits from 9 bits of input from 3 different vectors 
342a, 342b and 342c. Again, the selection depends upon the constraint length and number of stages. The complete 
traceback address is constructed by concatenating the lower address bits 342d with the higher bits 342e of a traceback 
pointer 342f, which is shown as an arrow feeding lines 342a and 342e. This pointer 342f comes from a counter (not
shown) within the traceback controller 334 and provides for moving backwards through the traceback memory 132 to 
achieve traceback. A portion of the traceback pointer bits 342a may also be used to form the lower address bits 342d 
via mux 342 as necessary for various constraint length codes. 

[0084] The traceback controller 334 contains all the logic for controlling the Traceback Unit 130 and includes numerous
counters, registers and multiplexors which are necessary for controlling the operations described previously, and for 
controlling address generation and memory data management. 

[0085] Now referring to Fig. 11, a methodology for a Viterbi decoding system is shown in accordance with an 
embodiment of the present invention. At step 400, a plurality of ACS operations are performed over a plurality of ACS 
stages via a cascaded ACS unit 122 as described above (see, e.g., Fig. 7). Proceeding to step 410, path decisions are 
determined for all branches entering a stage of the ACS 122 as a result of the ACS operations of step 400 (see, e.g., 
Fig. 7, Fig. 8). Proceeding to step 420, path decisions are accumulated during the ACS operations by widening the data 
path of the ACS (e.g., appending path decision bits to the ACS data path, see, e.g., Fig. 8). 

[0086] At step 430, accumulated path decisions are forwarded to succeeding ACS stages (see, e.g., Fig. 8). This 
may be accomplished, for example, by routing the accumulated path decisions to the succeeding ACS stages based 
upon identified path decisions of the succeeding ACS stage (e.g., path decision bit of succeeding stage selects accu- 
mulated path from previous stage via mux circuit). The accumulated path decisions are then combined with the path 
decisions of the succeeding ACS stage via the mux circuit (see, e.g., Fig. 8). 

[0087] At step 440, a set of accumulated path decisions over a plurality of ACS stages is provided to the traceback
memory. For example, the accumulated path decisions may be appended to the path decisions of succeeding ACS 
stages. The appended path decisions are then stored in the widened ACS data path (see, e.g., Fig. 8). After completing 
step 440, the process proceeds back to step 400 whereby more decoding operations may be performed. 
[0088] Although certain preferred embodiment or embodiments of the invention have been shown and described, 
it is obvious that equivalent alterations and modifications will occur to others skilled in the art upon the reading and 
understanding of this specification and the annexed drawings. In particular regard to the various functions performed 
by the above described components (assemblies, devices, circuits, etc.), the terms (including a reference to a "means") 
used to describe such components are intended to correspond, unless otherwise indicated, to any component which 
performs the specified function of the described component (i.e., that is functionally equivalent), even though not struc- 
turally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary embodi- 
ments of the invention. In addition, while a particular feature may have been disclosed with respect to only one of 
several embodiments, such feature may be combined with one or more other features of the other embodiments as may 
be desired and advantageous for any given or particular application. 

[0089] The foregoing describes embodiments of the invention utilising a programmable DSP system comprising 
programmable logic devices. In an embodiment of the invention, the programmable logic devices may be configured by 
a computer program to operate in accordance with the invention. Such a computer program may be supplied and stored 
in suitable storage media such as disk, tape or solid-state memory. Optionally, the computer program may be supplied 
remotely via an on-line connection to a suitable server for downloading the program to the devices, for example.
[0090] The scope of the present disclosure includes any novel feature or combination of features disclosed therein 
either explicitly or implicitly or any generalisation thereof irrespective of whether or not it relates to the claimed invention 
or mitigates any or all of the problems addressed by the present invention. The applicant hereby gives notice that new 
claims may be formulated to such features during the prosecution of this application or of any such further application 
derived therefrom. In particular, with reference to the appended claims, features from dependent claims may be combined 
with those of the independent claims and features from respective independent claims may be combined in any 
appropriate manner and not merely in the specific combinations enumerated in the claims.

Claims 

1. A decoder system, comprising:

a State Metric Update unit including a state metric memory and a cascaded Add/Compare/Select (ACS) unit, 
wherein the cascaded ACS unit comprises a plurality of serially coupled ACS stages for performing a plurality 
of ACS operations in conjunction with the state metric memory, 
wherein an ACS stage is operable to identify a plurality of path decisions and communicate the identified path 
decisions to a next ACS stage coupled thereto; and 

a Traceback unit for storing a set of accumulated path decisions in a traceback memory associated therewith, 
and performing a traceback on the set of accumulated path decisions, 

wherein the path decisions associated with the ACS stage and the next ACS stage are accumulated as a set 
during the ACS operations before being written to the traceback memory, thereby minimizing accesses to the 
traceback memory.

2. The decoder system of claim 1, wherein the identified path decisions are accumulated as a set by forwarding the 
identified path decisions from the ACS stage to the next ACS stage during ACS operations. 

3. The decoder system of claim 1 or 2, wherein the State Metric Update unit further comprises a branch metric 
selection unit operable to select and synchronize a set of branch metric values with a set of state metric values during 
ACS operations.

4. The branch metric selection unit of claim 3 further including a delay circuit for synchronizing the set of branch 
metrics with respect to a plurality of ACS stages.

5. The branch metric selection unit of claim 3 or 4, further including a state index generator for providing state metric 
memory addressing for identifying state metric values for stages of the cascaded ACS unit and state indices for 
branch metric selection.

6. The branch metric selection unit of any of claims 3 to 5, further including a set of code polynomial registers for 
storing user supplied code polynomials and enabling flexible operations on a plurality of convolutional codes.

7. The branch metric selection unit of claim 6 further comprising:

a branch metric index logic circuit for receiving the user supplied code polynomials, wherein the branch metric 
index logic receives a state index and an auxiliary counter to produce branch metric indices for multiple ACS 
stages and to select a correct branch metric. 

8. The decoding system of any preceding claim, further including a DSP interface circuit operatively coupled to both 
the State Metric Update unit and the Traceback unit, wherein the DSP interface circuit communicates the branch 
metrics from a DSP to the Traceback unit and transmits decoded output bits from the Traceback unit to the DSP. 

9. The decoding system of claim 8, wherein the DSP performs branch metric computations and depuncturing, thereby 
facilitating flexible operations of the decoding system.

10. The traceback unit of any preceding claim, wherein traceback operations are performed on a plurality of constraint 
length codes. 

11. A method for Viterbi decoding comprising the steps of:

performing Add/Compare/Select (ACS) operations over a plurality of ACS stages; 
determining path decisions during the ACS operations;
accumulating the path decisions based upon the ACS operations; 

forwarding the accumulated path decisions to a succeeding ACS stage of the plurality of ACS stages; and 
providing the accumulated path decisions as a set to a traceback memory to reduce traceback memory 
accesses. 

12. The method of claim 11, wherein the step of accumulating path decisions further comprises the step of: 

widening an ACS data path for receiving path decisions based upon the ACS operations. 

13. The method of claim 11 or 12, wherein the step of forwarding the accumulated path decisions further comprises 
the steps of:

routing the accumulated path decisions of a succeeding ACS stage based upon identified path decisions of a 
present ACS stage; and 

combining the identified accumulated path decisions with the path decisions of the succeeding ACS stage. 

14. The method of any of claims 11 to 13, wherein the step of providing accumulated path decisions as a set to a 
traceback memory further comprises the step of:

appending the accumulated path decisions to a set of path decisions associated with the succeeding ACS 
stages. 

15. A computer program comprising computer-implementable instructions for configuring a computer to operate in 
accordance with the method of any one of claims 11 to 14. 

16. A computer program as claimed in claim 15, embodied on a computer-readable medium.